Visual Attributes in Videos

Authors

  • Marielle Morris
  • Mahdi Kalayeh
Abstract

Attributes are descriptive characteristics with cross-category applicability. They provide information that simple object classification does not, giving the computer a fuller picture of an item and the state in which it appears. While attribute prediction has been studied extensively in still images, the same is not true for videos, even though a video contains visual cues that are absent from any single frame. In this paper, we discover and annotate visual attributes for the YouTube-BoundingBoxes dataset, producing the largest video attribute dataset to date. We have collected 20,000 videos paired with over 85 attribute annotations describing 10 different classes of objects. Using this YouTube-Attributes dataset, a model can be trained for scalable multi-label classification, and we demonstrate that the resulting attribute classifiers are both accurate and highly scalable.
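A minimal sketch of the multi-label formulation the abstract describes: unlike single-label classification, each attribute receives its own independent sigmoid score, so a video can carry any subset of the 85 attributes at once. All names, dimensions, and weights below are illustrative, not taken from the paper.

```python
import math

def predict_attributes(features, weights, bias, threshold=0.5):
    """Multi-label prediction: one independent sigmoid per attribute.

    features: feature vector for a video (list of floats)
    weights:  one weight vector per attribute
    bias:     one bias per attribute
    Returns a multi-hot list (1 = attribute present).
    """
    preds = []
    for w_col, b in zip(weights, bias):
        logit = sum(f * w for f, w in zip(features, w_col)) + b
        prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid, per attribute
        preds.append(1 if prob >= threshold else 0)
    return preds

# Toy example: a 4-dim feature vector scored against 3 hypothetical attributes.
features = [0.5, -1.2, 0.3, 2.0]
weights = [[1.0, 0.0, -0.5, 0.2],   # attribute 0
           [0.0, 1.0, 0.0, 0.0],    # attribute 1
           [-1.0, 0.5, 0.5, 0.1]]   # attribute 2
bias = [0.0, 0.0, 0.0]
pred = predict_attributes(features, weights, bias)  # -> [1, 0, 0]
```

Because the per-attribute scores are independent, the scheme scales to adding new attributes without retraining the existing heads, which is the sense in which such classifiers are "scalable".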


Similar Resources

Visual Graph Mining

In this study, we formulate the concept of “mining maximal-size frequent subgraphs” in the challenging domain of visual data (images and videos). In general, visual knowledge can usually be modeled as attributed relational graphs (ARGs) with local attributes representing local parts and pairwise attributes describing the spatial relationship between parts. Thus, from a practical perspective, su...


Structured Label Inference for Visual Understanding

Visual data such as images and videos contain a rich source of structured semantic labels as well as a wide range of interacting components. Visual content could be assigned with fine-grained labels describing major components, coarse-grained labels depicting high level abstractions, or a set of labels revealing attributes. Such categorization over different, interacting layers of labels evince...


Language Guided Visual Perception

People typically learn through exposure to visual stimuli associated with linguistic descriptions. For instance, teaching visual concepts to children is often accompanied by descriptions in text or speech. This motivates the question of how this learning process could be computationally modeled. In this dissertation we explored three settings, where we showed that combining language and vision ...


Compressed-Sampling-Based Image Saliency Detection in the Wavelet Domain

When watching natural scenes, an overwhelming amount of information is delivered to the Human Visual System (HVS). The optic nerve is estimated to receive around 10^8 bits of information per second. This large amount of information can't be processed right away through our neural system. Visual attention mechanism enables HVS to spend neural resources efficiently, only on the selected parts of the...


Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

We propose a new zero-shot Event Detection method by Multi-modal Distributional Semantic embedding of videos. Our model embeds object and action concepts as well as other available modalities from videos into a distributional semantic space. To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following direct...




Publication date: 2017